173 research outputs found

    Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters

    Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows for the identification of core biosynthetic enzymes using genome mining strategies that are based on the sequence similarity of the involved enzymes/genes. However, mining for a variety of BGCs quickly reaches a level of complexity at which manual analysis is no longer feasible and automated genome mining pipelines, such as the antiSMASH software, are required. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats such as rule-based BGC detection, sequence and annotation quality and cluster boundary prediction, which all have to be considered while planning for, performing and analyzing the results of genome mining studies.

    NRPSpredictor2: a web server for predicting NRPS adenylation domain specificity

    The products of many bacterial non-ribosomal peptide synthetases (NRPS) are highly important secondary metabolites, including vancomycin and other antibiotics. The ability to predict substrate specificity of newly detected NRPS Adenylation (A-) domains by genome sequencing efforts is of great importance to identify and annotate new gene clusters that produce secondary metabolites. Prediction of A-domain specificity based on the sequence alone can be achieved through sequence signatures or, more accurately, through machine learning methods. We present an improved predictor, based on previous work (NRPSpredictor), that predicts A-domain specificity using Support Vector Machines on four hierarchical levels, ranging from gross physicochemical properties of an A-domain's substrates down to single amino acid substrates. The three more general levels are predicted with an F-measure better than 0.89 and the most detailed level with an average F-measure of 0.80. We also modeled the applicability domain of our predictor to estimate for new A-domains whether they lie in the applicability domain. Finally, since there are also NRPS that play an important role in natural products chemistry of fungi, such as peptaibols and cephalosporins, we added a predictor for fungal A-domains, which predicts gross physicochemical properties with an F-measure of 0.84. The service is available at http://nrps.informatik.uni-tuebingen.de/
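
    The F-measure values quoted in this abstract summarize precision and recall in a single number. As a minimal sketch, assuming the balanced F1 form (the abstract does not say which variant was used) and using made-up counts that are not taken from the paper:

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true positives, false positives and false negatives."""
    if tp == 0:
        # No correct positive predictions: define the score as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted positives that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that were found
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
score = f_measure(tp=80, fp=20, fn=20)
print(round(score, 2))
```

    With equal precision and recall, as in this toy case, the F-measure equals both of them.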

    plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

    Plant specialized metabolites are chemically highly diverse, play key roles in host-microbe interactions, have important nutritional value in crops and are frequently applied as medicines. It has recently become clear that plant biosynthetic pathway-encoding genes are sometimes densely clustered in specific genomic loci: biosynthetic gene clusters (BGCs). Here, we introduce plantiSMASH, a versatile online analysis platform that automates the identification of candidate plant BGCs. Moreover, it allows integration of transcriptomic data to prioritize candidate BGCs based on the coexpression patterns of predicted biosynthetic enzyme-coding genes, and facilitates comparative genomic analysis to study the evolutionary conservation of each cluster. Applied on 48 high-quality plant genomes, plantiSMASH identifies a rich diversity of candidate plant BGCs. These results will guide further experimental exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery.

    Exploration and exploitation of the environment for novel specialized metabolites

    Microorganisms are Nature's little engineers of a remarkable array of bioactive small molecules that represent most of our new drugs. The wealth of genomic and metagenomic sequence data generated in the last decade has shown that the majority of novel biosynthetic gene clusters (BGCs) is identified from cultivation-independent studies, which has led to a strong expansion of the number of microbial taxa known to harbour BGCs. The large size and repeat sequences of BGCs remain a bioinformatic challenge, but newly developed software tools have been created to overcome these issues and are paramount to identify and select the most promising BGCs for further research and exploitation. Although heterologous expression of BGCs has been the greatest challenge until now, a growing number of polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS)-encoding gene clusters have been cloned and expressed in bacteria and fungi based on techniques that mostly rely on homologous recombination. Finally, combining ecological insights with state-of-the-art computation and molecular methodologies will allow for further comprehension and exploitation of microbial specialized metabolites.

    Minimum Information about a Biosynthetic Gene cluster

    A wide variety of enzymatic pathways that produce specialized metabolites in bacteria, fungi and plants are known to be encoded in biosynthetic gene clusters. Information about these clusters, pathways and metabolites is currently dispersed throughout the literature, making it difficult to exploit. To facilitate consistent and systematic deposition and retrieval of data on biosynthetic gene clusters, we propose the Minimum Information about a Biosynthetic Gene cluster (MIBiG) data standard. Funding: Netherlands Organization for Scientific Research (NWO)/Rubicon/825.13.001; EU/FP7/Joint Call OCEAN; Biotechnology and Biological Sciences Research Council (BBSRC); Natural Environment Research Council (UK); National Institute for Energy Ethics and Society (NIEeS; UK); Gordon and Betty Moore Foundation; National Science Foundation (NSF; US); US Department of Energy; Engineering and Physical Sciences Research Council (EPSRC).

    A predicted physicochemically distinct sub-proteome associated with the intracellular organelle of the anammox bacterium Kuenenia stuttgartiensis

    Medema MH, Zhou M, van Hijum SAFT, et al. A predicted physicochemically distinct sub-proteome associated with the intracellular organelle of the anammox bacterium Kuenenia stuttgartiensis. BMC Genomics. 2010;11(1): 299. Background: Anaerobic ammonium-oxidizing (anammox) bacteria perform a key step in global nitrogen cycling. These bacteria make use of an organelle to oxidize ammonia anaerobically to nitrogen (N2) and so contribute ~50% of the nitrogen in the atmosphere. It is currently unknown which proteins constitute the organellar proteome and how anammox bacteria are able to specifically target organellar and cell-envelope proteins to their correct final destinations. Experimental approaches are complicated by the absence of pure cultures and genetic accessibility. However, the genome of the anammox bacterium Candidatus "Kuenenia stuttgartiensis" has recently been sequenced. Here, we make use of these genome data to predict the organellar sub-proteome and address the molecular basis of protein sorting in anammox bacteria. Results: Two training sets representing organellar (30 proteins) and cell envelope (59 proteins) proteins were constructed based on previous experimental evidence and comparative genomics. Random forest (RF) classifiers trained on these two sets could differentiate between organellar and cell envelope proteins with ~89% accuracy using 400 features consisting of frequencies of two adjacent amino acid combinations. A physicochemically distinct organellar sub-proteome containing 562 proteins was predicted with the best RF classifier. This set included almost all catabolic and respiratory factors encoded in the genome. Apparently, the cytoplasmic membrane performs no catabolic functions. We predict that the Tat-translocation system is located exclusively in the organellar membrane, whereas the Sec-translocation system is located on both the organellar and cytoplasmic membranes. Canonical signal peptides were predicted and validated experimentally, but a specific (N- or C-terminal) signal that could be used for protein targeting to the organelle remained elusive. Conclusions: A physicochemically distinct organellar sub-proteome was predicted from the genome of the anammox bacterium K. stuttgartiensis. This result provides strong in silico support for the existing experimental evidence for the existence of an organelle in this bacterium, and is an important step forward in unravelling a geochemically relevant case of cytoplasmic differentiation in bacteria. The predicted dual location of the Sec-translocation system and the apparent absence of a specific N- or C-terminal signal in the organellar proteins suggest that additional chaperones may be necessary that act on an as-yet unknown property of the targeted proteins.
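
    The "400 features consisting of frequencies of two adjacent amino acid combinations" correspond to the 20 x 20 possible ordered pairs of standard amino acids. The sketch below shows one plausible way to compute such a feature vector in Python. The function name, the normalization by the number of valid pairs, and the toy sequence are assumptions for illustration and are not taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All ordered residue pairs: 20 * 20 = 400 dipeptides.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_frequencies(sequence):
    """Return a 400-element list of adjacent-residue pair frequencies.

    Assumption: frequencies are normalized by the number of pairs made
    up of standard residues; pairs containing other symbols are skipped.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or without standard residues: all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy sequence, for illustration only.
features = dipeptide_frequencies("MKKLLA")
print(len(features))
```

    A vector of this kind, computed per protein, could then be passed to any random forest implementation as the input feature matrix.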

    Linking genomics and metabolomics to chart specialized metabolic diversity

    Microbial and plant specialized metabolites constitute an immense chemical diversity, and play key roles in mediating ecological interactions between organisms. Also referred to as natural products, they have been widely applied in medicine, agriculture, cosmetic and food industries. Traditionally, the main discovery strategies have centered around the use of activity-guided fractionation of metabolite extracts. Increasingly, omics data is being used to complement this, as it has the potential to reduce rediscovery rates, guide experimental work towards the most promising metabolites, and identify enzymatic pathways that enable their biosynthetic production. In recent years, genomic and metabolomic analyses of specialized metabolic diversity have been scaled up to study thousands of samples simultaneously. Here, we survey data analysis technologies that facilitate the effective exploration of large genomic and metabolomic datasets, and discuss various emerging strategies to integrate these two types of omics data in order to further accelerate discovery.

    Genomic mutational analysis of the impact of the classical strain improvement program on β-lactam producing Penicillium chrysogenum

    BACKGROUND: Penicillium chrysogenum is a filamentous fungus that is employed as an industrial producer of β-lactams. The high β-lactam titers of current strains are the result of a classical strain improvement (CSI) program that started with a wild-type-like strain more than six decades ago. This involved extensive mutagenesis and strain selection for improved β-lactam titers and growth characteristics. However, the impact of the CSI on secondary metabolism in general remains unknown. RESULTS: To examine the impact of CSI on secondary metabolism, a comparative genomic analysis of β-lactam producing strains was carried out by genome sequencing of three P. chrysogenum strains that are part of a lineage of the CSI, i.e., strains NRRL1951, Wisconsin 54-1255, DS17690, and the derived penicillin biosynthesis cluster free strain DS68530. CSI has resulted in a wide spread of mutations that statistically did not result in an over- or underrepresentation of specific gene classes. However, in this set of mutations, 8 out of 31 secondary metabolite genes (20 polyketide synthases and 11 non-ribosomal peptide synthetases) were targeted, with a corresponding and progressive loss in the production of a range of secondary metabolites unrelated to β-lactam production. Additionally, key Velvet complex proteins (LaeA and VelA) involved in global regulation of secondary metabolism have been repeatedly targeted for mutagenesis during CSI. Using comparative metabolic profiling, the polyketide synthase gene cluster responsible for sorbicillinoid biosynthesis was identified; sorbicillinoids are a group of yellow-colored metabolites that are abundantly produced by early production strains of P. chrysogenum. CONCLUSIONS: The classical industrial strain improvement of P. chrysogenum has had a broad mutagenic impact on metabolism and has resulted in silencing of specific secondary metabolite genes with the concomitant diversion of metabolism towards the production of β-lactams.

    Metabolomics methods for the synthetic biology of secondary metabolism

    Many microbial secondary metabolites are of high biotechnological value for medicine, agriculture, and the food industry. Bacterial genome mining has revealed numerous novel secondary metabolite biosynthetic gene clusters, which encode the potential to synthesize a large diversity of compounds that have never been observed before. The stimulation or “awakening” of this cryptic microbial secondary metabolism has naturally attracted the attention of synthetic microbiologists, who exploit recent advances in DNA sequencing and synthesis to achieve unprecedented control over metabolic pathways. One of the indispensable tools in the synthetic biology toolbox is metabolomics, the global quantification of small biomolecules. This review illustrates the pivotal role of metabolomics for the synthetic microbiology of secondary metabolism, including its crucial role in novel compound discovery in microbes, the examination of side products of engineered metabolic pathways, as well as the identification of major bottlenecks for the overproduction of compounds of interest, especially in combination with metabolic modeling. We conclude by highlighting remaining challenges and recent technological advances that will drive metabolomics towards fulfilling its potential as a cornerstone technology of synthetic microbiology.

    antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline

    Secondary metabolites produced by bacteria and fungi are an important source of antimicrobials and other bioactive compounds. In recent years, genome mining has seen broad applications in identifying and characterizing new compounds as well as in metabolic engineering. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMASH' (https://antismash.secondarymetabolites.org) has assisted researchers in this, both as a web server and a standalone tool. It has established itself as the most widely used tool for identifying and analysing biosynthetic gene clusters (BGCs) in bacterial and fungal genome sequences. Here, we present an entirely redesigned and extended version 5 of antiSMASH. antiSMASH 5 adds detection rules for clusters encoding the biosynthesis of acyl-amino acids, β-lactones, fungal RiPPs, RaS-RiPPs, polybrominated diphenyl ethers, C-nucleosides, PPY-like ketones and lipolanthines. For type II polyketide synthase-encoding gene clusters, antiSMASH 5 now offers more detailed predictions. The HTML output visualization has been redesigned to improve the navigation and visual representation of annotations. We have again improved the runtime of analysis steps, making it possible to deliver comprehensive annotations for bacterial genomes within a few minutes. A new output file in the standard JavaScript object notation (JSON) format is aimed at downstream tools that process antiSMASH results programmatically.